[BFCL] Add dynamic max_token handling for locally hosted models #693

yutongxie58 · 2024-10-14T04:52:46Z

This PR adds a get_max_tokens function that dynamically sets the max_tokens limit based on the model name. It supports different models like Llama, GLM, Phi, Hermes, etc. The logic is based on model families and their typical token limits. Tested locally with various model names.

HuanzhiMao

Hey @yutongxie58,

Thanks for the PR and welcome!

According to OpenAI's spec, the max_token for the chat.completion endpoint is the maximum number of tokens that can be generated. It does not include the input token count. For example, if a model has a context length of 4096, our input message takes1000 tokens and you set the max_token to 4096, then that would error because the total number of tokens (1000 in input and 4096 requesed for output) exceeds the model's context window length. So what we want to do is that, before call the chat.completion endpoint, use the model's tokenizer (each model has its own tokenizer) to figure out how many tokens the input message formatted_prompt has used, subtract that amount from the model's maximum context length and supply that value for the max_token argument so that we will never get the maximum length exceeded error. In short, we want to allow the model to generate as many as possible till the limit of its context length.

Let me know if this makes sense.

Added dynamic max_token handling based on model names

4ed006d

HuanzhiMao requested changes Oct 14, 2024

View reviewed changes

Merge branch 'main' into dynamic-max-tokens

e88001c

HuanzhiMao added the BFCL-General General BFCL Issue label Oct 14, 2024

HuanzhiMao changed the title ~~Add dynamic max_token handling for locally hosted models~~ [BFCL] Add dynamic max_token handling for locally hosted models Oct 14, 2024

yutongxie58 closed this by deleting the head repository Oct 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BFCL] Add dynamic max_token handling for locally hosted models #693

[BFCL] Add dynamic max_token handling for locally hosted models #693

yutongxie58 commented Oct 14, 2024

HuanzhiMao left a comment •

edited

Loading

[BFCL] Add dynamic max_token handling for locally hosted models #693

[BFCL] Add dynamic max_token handling for locally hosted models #693

Conversation

yutongxie58 commented Oct 14, 2024

HuanzhiMao left a comment • edited Loading

Choose a reason for hiding this comment

HuanzhiMao left a comment •

edited

Loading